{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# 逻辑分类LC"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "首先介绍一下线性模型。\n",
    "\n",
    "线性模型：**把每个特征对分类结果的“作用”加起来——这就是线性模型。**"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "逻辑分类（Logistic Classification）是一种线性模型，可以表示为$y =f(x*w+b)$，其中w是训练得到的权重参数（Weight）；\n",
    "\n",
    "x是样本特征数据；b是偏置（Bias），f成为激活函数。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "就是给定一批样本数据集，和样本对象所属的分类，进行建立模型。使用模型对新来的对象对象分类预测。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "值得注意的是，逻辑分类中，我们想要的结果为分类类型，而不再是线性回归中的数值。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "> 注意：在逻辑分类中，我们想要的结果是对象所属的分类，比如A类、B类，属于标称型数据。\n",
    "\n",
    ">但是在数学中，一般没办法使用A、B表示，而使用0、1、2、3表示每种类别。注意这里的0、1、2、3只是表示对应的标称分类，并不表示具体的数值。\n",
    "\n",
    ">也就是说，并不能表述为1比0大，就像不能表述为B类比A类大。也不能表述为1比3更靠近0，就像不能表述为B类比C类更靠近A类。必能在逻辑分类的学习中，一定要注意把握这一点。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "既然逻辑分类中，我们想要的输出结果不再是数值型的，但是在线性回归中我们学习的都是计算数值型的结果，那怎么应用呢？"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "我们可以使用一种成为**one-hot编码**的方式，将分类编码成数值。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "比如有4个分类0、1、2、3。我们可以将"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "分类0编码成[1,0,0,0]  第一个分量的值为1，其他为0，表示为对象属于第一个分类的概率为1，属于其他分类的概率为0。\n",
    "\n",
    "分类1编码成[0,1,0,0]  第二个分量的值为1，其他为0，表示为对象属于第二个分类的概率为1，属于其他分类的概率为0。\n",
    "\n",
    "分类2编码成[0,0,1,0]  第三个分量的值为1，其他为0，表示为对象属于第三个分类的概率为1，属于其他分类的概率为0。\n",
    "\n",
    "分类3编码成[0,0,0,1]  第四个分量的值为1，其他为0，表示为对象属于第四个分类的概率为1，属于其他分类的概率为0。\n",
    "\n",
    "而分类概率是数值型的，也就是说我们预测的结果就变成了四维数值型的。\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 重点来了"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "所以在建立逻辑分类模型前必须先将标称分类，转化为数值型的one-hot编码才能运用线性模型进行数值运算。\n",
    "\n",
    "也就是说逻辑分类一定是多输出的。只不过你在拿到样本数据集时样本数据的结果都是标称型的分类，这种数据集是不能应用于逻辑分类建模的。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "比如下面是你拿到的数据集。第一列为属性x1，第二列为属性x2，第三列为分类结果，包含0,1,2三种分类"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "\n",
    "```\n",
    "x1\t\tx2\t\ty\n",
    "0.1\t\t0.2\t\t0\n",
    "0.12\t0.32\t0\n",
    "0.21\t0.31\t1\n",
    "0.25\t0.15\t2\n",
    "```"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "必须先将数据集转化为下面的模式。x0为常值0，对象的回归系数表示偏量值b，标称型分为转化为数值型概率。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "```\n",
    "x0\t\tx1\t\tx2\t\ty1\t\ty2\t\ty3\n",
    "1\t\t0.1\t\t0.2\t\t1\t\t0\t\t0\n",
    "1\t\t0.12\t0.32\t1\t\t0\t\t0\n",
    "1\t\t0.21\t0.31\t0\t\t1\t\t0\n",
    "1\t\t0.25\t0.15\t0\t\t0\t\t1\n",
    "```"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "这样我们要求的回归系数矩阵w也不再是线性回归中的向量，而是矩阵。\n",
    "\n",
    "比如\n",
    "有m行x，每一个x有n个属性，然后每个x有一个分类，也就有m行y，每个y可以被分成k个元素的向量，我需要先把m行y变成一个m行k列的矩阵，最后计算出一个（j+1）*k列的w矩阵。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
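  {
   "cell_type": "markdown",
   "source": [
    "A quick sketch of this encoding (a minimal example, independent of the loader defined later in this tutorial): a k-by-k identity matrix already contains every one-hot vector, so indexing it with the integer labels produces the m-by-k probability matrix Y."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Toy labels: m = 4 samples, k = 3 classes (labels must be the integers 0..k-1)\n",
    "y = np.array([0, 0, 1, 2])\n",
    "k = len(np.unique(y))\n",
    "\n",
    "Y = np.eye(k)[y]  # shape (m, k): row i of the identity matrix is the one-hot vector for class i\n",
    "print(Y)"
   ],
   "metadata": {
    "collapsed": false
   }
  },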
  {
   "cell_type": "markdown",
   "source": [
    "## 线性回归、线性分类、逻辑回归、逻辑分类的区别 ##\n",
    "\n",
    "**线性回归：**\n",
    "\n",
    "线性回归就是计算回归系数，通过回归系数线性组合属性预测数值结果。\n",
    "\n",
    "线性回归以误差平方和最小为目的，其实是在假定**误差**服从高斯分布\n",
    "\n",
    "> $f(x)=xw+b$"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "**二分类的线性分类：**\n",
    "\n",
    "在线性回归的基础上添加了大于和小于0的判断。当值大于0属于一种分类，当值小于0等于另一种分类。\n",
    "\n",
    "$$ f(n)= \\begin{cases} 1, & \\text {f(x) $\\geq$ 0} \\\\ -1, & \\text{f(x)<0} \\end{cases} $$"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "**逻辑回归：**\n",
    "\n",
    "逻辑回归是逻辑分类的一种。\n",
    "\n",
    "逻辑分类为\n",
    "\n",
    "> $y =f(x*w+b)$\n",
    "\n",
    "当f函数为sigmoid函数时，就是逻辑回归。即\n",
    "\n",
    "> $y =sigmoid(x*w+b)$\n",
    "\n",
    "逻辑回归就是在线性回归的基础上，再经过sigmoid这个非线性函数，将值转化为分类的概率。 逻辑回归实际上是采用的是 伯努利分布来分析**误差**。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "**sigmoid函数：**\n",
    "\n",
    "> $sigmoid(x)=\\frac{1}{1+e^{-x}} $\n",
    "\n",
    "这个激活函数，一个输入值产生一个输出值，当输入为向量，就会每个分量计算一个输出值，再将所有的输出值组合成输出向量。\n",
    "\n",
    "![这里写图片描述](https://img-blog.csdnimg.cn/img_convert/c54510a503a382486e2470046fd1726a.gif)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
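  {
   "cell_type": "markdown",
   "source": [
    "A minimal numeric sketch of this elementwise behavior (the input values below are arbitrary):"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def sigmoid(x):\n",
    "    # 1 / (1 + e^{-x}), applied elementwise by numpy broadcasting\n",
    "    return 1.0 / (1.0 + np.exp(-x))\n",
    "\n",
    "z = np.array([-2.0, 0.0, 2.0])\n",
    "print(sigmoid(z))  # approx [0.1192 0.5 0.8808]"
   ],
   "metadata": {
    "collapsed": false
   }
  },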
  {
   "cell_type": "markdown",
   "source": [
    "**逻辑分类：**\n",
    "\n",
    "逻辑分类属于线性模型。\n",
    "\n",
    "逻辑分类包含逻辑回归、一般线性回归。\n",
    "\n",
    "逻辑回归为二分类问题，一般线性回归为多分类问题。\n",
    "\n",
    "二分类：在逻辑回归的基础上，判断分类概率是否大于0.5，大于则属于这个分类。\n",
    "\n",
    "$$ y= \\begin{cases} 1, & \\text {f(xw+b) $\\geq$ 0.5} \\\\ -1, & \\text{f(xw+b)<0.5} \\end{cases} $$\n",
    "\n",
    "而由于sigmoid函数是单调的，当sigmoid的输入为0时，输出为0.5，所以等价于逻辑回归二分类的判别为以下公式，这与线性分类的公式相同。\n",
    "\n",
    "> $xw+b=0$\n",
    "\n",
    "多分类：一般线性回归：\n",
    "\n",
    "高斯分布、伯努利分布、贝塔分布、迪特里特分布，都属于指数分布。\n",
    "\n",
    "所以一般线性回归就是用指数分布来处理噪声模型的方法。\n",
    "\n",
    "一般使用softmax函数来处理多分类问题。softmax回归是一般线性回归的一个例子。\n",
    "\n",
    "**$f(xw+b)=softmax(xw+b)$**"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "**softmax函数**\n",
    "\n",
    "softmax函数其实就是一个归一化的指数函数\n",
    "\n",
    "**$ softmax(x) = \\frac{e^{x_i}}{\\sum_i e^{x_i}}$**\n",
    "\n",
    "这个激活函数，输入必须为一组向量，每个分量计算出一个比例值作为输出向量的分量。\n",
    "\n",
    "在使用一般线性回归进行分类中，计算哪个分类的概率大就属于哪个分类。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "\n",
    "\n",
    "> 注意：二分类逻辑回归的输出类别必须是0或1，建立模型时不需要对类标签进行one-hot编码，代表的结果为正样本的概率。\n",
    "\n",
    ">多分类一般线性回归，必须将类标签进行one-hot编码。逻辑回归和一般线性回归只能分类线性模型。\n",
    "\n",
    ">逻辑回归的每个样本的输出值为单个数值，表示正样本出现的概率，也就是类标号1出现的概率。一般逻辑回归的输出值为向量，表示每种类别出现的概率。\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 逻辑回归的损失函数\n",
    "\n",
    "我们知道在线性回归中使用误差平方和作为损失函数，求解使损失函数最小的w是线性回归的目的。\n",
    "\n",
    "在统计学上，衡量两个概率的分布向量的差异程度，叫做交叉熵（Cross Entropy）。\n",
    "\n",
    "最大似然估计，是模型w必须让已出现样本出现的概率最大化。而已出现样本出现的概率为连乘积的形式，所以先要取log形式，转化为加法形式的对数似然函数。对数似然函数前面加一个负数系数，则最大化对数自然函数，等价于最小化损失函数。\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "交叉熵函数如下：\n",
    "\n",
    "$D(y,p)=y log(p)+(1-y)log(1-p)$  其中$p=sigmoid(xw+b)$\n",
    "\n",
    "逻辑分类中的损失函数为\n",
    "\n",
    "$ loss(w,b)=-\\frac{1}{m} \\sum_{i=1}^m D(y_i,p_i)$"
   ],
   "metadata": {
    "collapsed": false
   }
  },
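  {
   "cell_type": "markdown",
   "source": [
    "A minimal numeric sketch of this loss with made-up labels y and predicted probabilities p (in practice p = sigmoid(xw+b)); the leading minus sign turns the log of the likelihood into a quantity to minimize:"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def cross_entropy_loss(y, p):\n",
    "    # loss(w,b) = -(1/m) * sum( y*log(p) + (1-y)*log(1-p) )\n",
    "    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))\n",
    "\n",
    "y = np.array([1, 0, 1, 1])          # true binary labels\n",
    "p = np.array([0.9, 0.2, 0.7, 0.6])  # made-up predicted probabilities\n",
    "print(cross_entropy_loss(y, p))     # approx 0.2990"
   ],
   "metadata": {
    "collapsed": false
   }
  },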
  {
   "cell_type": "markdown",
   "source": [
    "# 逻辑回归损失函数的来源\n",
    "\n",
    "注意：**在二分类器中，取值结果类标号0、1，认为是正样本的输出概率，二分类中不需要进行one-hot编码，所以每个样本计算的值为一个数值。**\n",
    "\n",
    "在多分类器中，要进行one-hot编码才能进行推导，每个样本计算的值为一个向量。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "对于二分类，类标号为0-1，可以理解为正样本的输出概率。而由于类标号只能取0、1，取值结果是一个二项分布，概率函数服从伯努利分布。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "sigmoid函数的输出值，表示样本为正样本的概率值。那么对于分类1和0的概率分别为：\n",
    "\n",
    "$$P(y=1|x;\\theta )=h_{\\theta }(x)\\\\P(y=0|x;\\theta )=1-h_{\\theta }(x)$$"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "将上面两个公式写成一个公式为： \n",
    "\n",
    "$$P(y|x;\\theta )=(h_{\\theta }(x))^{y}((1-h_{\\theta }(x)))^{1-y}$$\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "预测样本x的结果为真实结果y的概率为上面的公式。那么预测一批样本，预测结果全部都和真实结果一样的概率为下面的公式。\n",
    "\n",
    "这也是最大似然的思想，逻辑回归似然函数： \n",
    "\n",
    "$$L(\\theta )=\\prod_{i=1}^{m}P(y_{i}|x_{i};\\theta )=\\prod_{i=1}^{m}(h_{\\theta }(x_{i}))^{y_{i}}((1-h_{\\theta }(x_{i})))^{1-y_{i}}$$"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "其中m表示样本数量。关于为什么最大似然函数就要取乘积，可以这样理解。\n",
    "\n",
    "最大似然就是让已经发生的事情（也就是样本）在你的模型中出现的概率尽可能大。\n",
    "\n",
    "也就是如果有一个和已知结果的样本特征一样的待测对象，那个你的预测结果也要和样本的真实结果尽可能一样。\n",
    "\n",
    "而每个样本可以理解为一个独立的事件。所有事件发生的概率尽可能大，就可以用联合概率表示所有样本都发生的概率。\n",
    "\n",
    "联合概率使用乘积的形式表示。这也就有了上述乘积的形式。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "取对数： \n",
    "\n",
    "$$l(\\theta )=logL(\\theta )=\\sum_{i=1}^{m}(y_{i}logh_{\\theta }(x_{i})+(1-y_{i})log(1-h_{\\theta }(x_{i})))$$"
   ],
   "metadata": {
    "collapsed": false
   }
  },
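  {
   "cell_type": "markdown",
   "source": [
    "A quick numeric check, with made-up values, that this log-likelihood is just $-m$ times the loss defined earlier, so maximizing one is minimizing the other:"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "y = np.array([1, 0, 1])\n",
    "h = np.array([0.8, 0.3, 0.9])  # made-up h_theta(x_i) outputs\n",
    "\n",
    "log_likelihood = np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))\n",
    "loss = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))\n",
    "print(np.isclose(log_likelihood, -len(y) * loss))  # True"
   ],
   "metadata": {
    "collapsed": false
   }
  },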
  {
   "cell_type": "markdown",
   "source": [
    "# 逻辑分类器的构成\n",
    "\n",
    "\n",
    "> 在学习逻辑分类之前再强调一遍，**必须先将分类进行one-hot编码作为输出结果，后面的学习才能理解。**\n",
    "\n",
    "> 也就是说如果有3种分类，后面的教程每个样本对象的输出结果都是3维的，而不再是线性回归教程中的一维数据结果。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "线性回归为估值预测而生，逻辑分类为分类预测而生。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "下面为线性回归和逻辑分类的模型示例。\n",
    "\n",
    "![这里写图片描述](https://img-blog.csdnimg.cn/img_convert/7a59591072068091f88b1f2e8e2f8622.png)\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "逻辑分类是一个包含线性和非线性部分用来进行分类的分类器。我们以n分类为例进行分析。\n",
    "\n",
    "假设已经知道回归系数矩阵w和偏量b，逻辑分类包含下面的3个步骤。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "**1、计算线性函数的输出值**\n",
    "\n",
    "根据输入对象x和已知的回归系数矩阵w，以及偏量b，计算输入经过线性函数后的输出值，即$h=wx+b$。输出值h也是n维的。每一维可以近似认为是待测对象偏向于每种分类的成分。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
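  {
   "cell_type": "markdown",
   "source": [
    "A minimal sketch of this step for two features plus the bias column and n = 3 classes (every number in w below is made up):"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "x = np.array([1.0, 0.2, 0.3])      # one object: constant 1 for the bias, then x1, x2\n",
    "w = np.array([[ 0.1, -0.4,  0.3],  # made-up (2+1) x 3 coefficient matrix\n",
    "              [ 1.2,  0.5, -0.7],\n",
    "              [-0.3,  0.9,  0.4]])\n",
    "\n",
    "h = x @ w  # h = xw: one raw score per class, not yet a probability\n",
    "print(h)"
   ],
   "metadata": {
    "collapsed": false
   }
  },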
  {
   "cell_type": "markdown",
   "source": [
    "**2、将线性函数的输出结果转化为分类概率**\n",
    "\n",
    "得到了线性函数的输出结果，还需要将每一个维度转化为对象属于每个分类的概率。使得每个概率在0-1之间。这是一步非线性操作，所以这一步不再属于线性回归的范围。\n",
    "\n",
    "将输出结果转化为概率的函数我们称为阶跃函数，也叫做激活函数。\n",
    "\n",
    "一般在神经网络中用的最多的是sigmoid和tanh，当然也有用relu的。这是针对“是”和“否”的分类，但当进行多分类时，就要用到softmax 。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "如果使用sigmoid函数进行二分类，sigmoid函数的功能在于：\n",
    "\n",
    "> $sigmoid(x)=\\frac{1}{1+e^{-x}} $\n",
    "\n",
    "1、将线性函数的输出向量作为输入的“分数”，将“分数”映射在（0,1）之间。\n",
    "\n",
    "2、以“对数”的方式完成到（0,1）的映射，凸显大的分数的作用，使其输出的概率更高，抑制小分数的输出概率。\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "如果使用softmax函数进行多分类，softmax函数的功能在于：\n",
    "\n",
    "**$ softmax(x) = \\frac{e^{x_i}}{\\sum_i e^{x_i}}$**\n",
    "\n",
    "1、将线性函数的输出向量作为输入的“分数”，将“分数”映射在（0,1）之间，并且所有分数的和为1。\n",
    "\n",
    "2、以“对数”的方式完成到（0,1）的映射，凸显其中最大的分数的作用，并抑制远低于最大分数的其他数值。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "我们使用代码来看看softmax的效果。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# softmax函数\n",
    "def softmax(s):\n",
    "    return np.exp(s) / np.sum(np.exp(s), axis=0)\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    # 查看softmax效果\n",
    "    import numpy as np;\n",
    "    import matplotlib.pyplot as plt;\n",
    "    x = np.arange(-3.0, 6.0, 0.1)  # 等差数列\n",
    "    scores = np.vstack([x, np.ones_like(x), 0.2 * np.ones_like(x)])  # vstack将三个矩阵按列堆叠在一起。ones_like全1矩阵。模拟三种线性函数的输出向量作为softmax的输入\n",
    "    plt.plot(x, softmax(scores).T, linewidth=2)  # 计算三种softmax函数的输出结果，绘制查看图形，了解softmax的特性:凸显其中最大的分数,抑制远低于最大分数的其他数值\n",
    "    plt.show()\n",
    "\n",
    "    # 分数扩大、缩小100倍\n",
    "    scores = np.array([2.0, 1.0, 0.1])\n",
    "    print(softmax(scores))\n",
    "    print(softmax(scores * 100))\n",
    "    print(softmax(scores / 100))"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "![这里写图片描述](https://img-blog.csdnimg.cn/img_convert/4971af6b641a38d5f271293cfedd55a5.png)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "输出：\n",
    "```\n",
    "[ 0.65900114  0.24243297  0.09856589]\n",
    "[  1.00000000e+00   3.72007598e-44   3.04823495e-83]\n",
    "[ 0.33656104  0.33321221  0.33022675]\n",
    "```"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "根据输出结果我们可以看出，当softmax的输入值被放大100倍时，数值之间的差距被拉大。\n",
    "\n",
    "softmax对这种差距敏感，差距越大，分类器越‘自信’，差距越小，分类器越‘犹豫’。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "**3、将分类概率转化为类别**"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "这一步就比较简单了，当属于某分类的概率最大时，就可以判定为数据该分类。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
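  {
   "cell_type": "markdown",
   "source": [
    "As a one-line sketch with hypothetical probabilities:"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "probs = np.array([0.05, 0.85, 0.10])  # hypothetical output of step 2\n",
    "print(np.argmax(probs))               # 1: the object is assigned to class 1"
   ],
   "metadata": {
    "collapsed": false
   }
  },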
  {
   "cell_type": "markdown",
   "source": [
    "> 那如何求解正确的回归系数w和偏量b呢，因为只有有了正确的w和b，才能带入逻辑分类公式进行分类判别。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "(后面的教程就省略b偏量，因为b偏量可以认为是样本一个值为1 的属性对应的权重)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 求解回归系数"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "在逻辑分类器的结构中，我们知道逻辑分类包含了线性函数和激活函数两个过程。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "线性函数$g$部分我们表示为：\n",
    "\n",
    "$g(X)=XW$\n",
    "\n",
    "逻辑回归的激活函数$f$部分我们表示为：\n",
    "\n",
    "$f(Z)=sigmoid(Z)$\n",
    "sigmoid函数的导数为$f'(Z)=f(Z)(1-f(Z))$\n",
    "\n",
    "softmax回归的激活函数$f$部分表示为：\n",
    "\n",
    "$f(Z)=softmax(Z)$\n",
    "\n",
    "当然我们也可以直接将输入x表示为最终的输出结果\n",
    "\n",
    "$Y=h(X)=f(Z)=f(g(X))=f(XW)$\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "----------\n",
    "\n",
    "\n",
    "在线性回归中，我们学习了线性回归的梯度下降法和随机梯度下降法，在逻辑回归、甚至以后的神经网络、全连接神经网络、卷积神经网络、循环神经网络都是使用梯度下降法（随机梯度下降法）来进行后向传播算法实现回归系数w或网络层系数w的迭代收敛计算。\n",
    "\n",
    "梯度下降法的步骤（线性回归、逻辑回归、所有的神经网络都是相同的步骤）：\n",
    "\n",
    "1、写出损失函数。\n",
    "\n",
    "2、损失函数对自变量w求导，得到w的梯度。\n",
    "\n",
    "3、根据梯度，更新w权重。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
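  {
   "cell_type": "markdown",
   "source": [
    "These three steps translate directly into a short loop. Below is a generic sketch; the gradient function and the toy loss $(w-3)^2$ used in the usage line are made up for illustration:"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "def gradient_descent(grad, w0, rho=0.05, iters=1000):\n",
    "    # Steps 2 and 3 of the recipe: evaluate the gradient at w,\n",
    "    # then move w against the gradient, for a fixed number of iterations\n",
    "    w = w0\n",
    "    for _ in range(iters):\n",
    "        w = w - rho * grad(w)\n",
    "    return w\n",
    "\n",
    "# Usage sketch: the toy loss (w - 3)^2 has gradient 2*(w - 3) and its minimum at w = 3\n",
    "w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)\n",
    "print(w_star)  # approx 3.0"
   ],
   "metadata": {
    "collapsed": false
   }
  },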
  {
   "cell_type": "markdown",
   "source": [
    "# 梯度下降法"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "现在我们先来回归一下线性回归中的梯度下降法。\n",
    "\n",
    "梯度下降法求解的是使损失函数（误差平方和）最小的权重矩阵w。\n",
    "\n",
    "误差平方和为\n",
    "\n",
    "$J(w)=\\sum_{i=1}^m(x_iw-y_i)^2$\n",
    "\n",
    "对w求导的得到梯度\n",
    "\n",
    "$\\nabla J(w)=2X^T(Xw-y)$\n",
    "\n",
    "回归系数更新\n",
    "\n",
    "$w_{k+1}=w_k-\\rho * \\nabla J(w_k)=w_k-\\rho * 2X^T(Xw_k-y)$\n",
    "\n",
    "----------\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
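  {
   "cell_type": "markdown",
   "source": [
    "In numpy that update rule is a one-liner. A minimal sketch on a made-up dataset that satisfies y = 1 + x exactly:"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "X = np.array([[1.0, 1.0],  # toy design matrix; the first column is the bias term\n",
    "              [1.0, 2.0],\n",
    "              [1.0, 3.0]])\n",
    "y = np.array([2.0, 3.0, 4.0])  # equals 1 + x exactly\n",
    "w = np.zeros(2)\n",
    "rho = 0.05\n",
    "\n",
    "for _ in range(2000):\n",
    "    w = w - rho * 2 * X.T @ (X @ w - y)  # w <- w - rho * 2 X^T (Xw - y)\n",
    "print(w)  # approaches [1. 1.]"
   ],
   "metadata": {
    "collapsed": false
   }
  },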
  {
   "cell_type": "markdown",
   "source": [
    "**我们来按照这个步骤解决逻辑回归中的梯度下降。**\n",
    "\n",
    "\n",
    "一般定义逻辑回归的损失函数为\n",
    "\n",
    "$J(w)=-\\frac{1}{m} \\sum_{i=1}^m[y_ilog{h_w(x_i)}+(1-y_i)log(1-h_w(x_i))]$\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "其中\n",
    "\n",
    "$x(i) $\t每个样本数据点在某一个特征上的值，即样本对象x的某个属性的值\n",
    "\n",
    "$y(i)\t$ 每个样本数据的所属类别标签\n",
    "\n",
    "$m$\t样本数据点的个数\n",
    "\n",
    "$h_w(x)$\t样本数据的概率密度函数，即某个数据属于1类（二分类问题）的概率\n",
    "\n",
    "$J(w)$\t代价函数，估计样本属于某类的风险程度，越小代表越有可能属于这类\n",
    "\n",
    "对损失函数求导得到w的梯度为\n",
    "\n",
    "$\\nabla J(w)=\\sum_{i=1}^m x_i^T *(f(x_iw)-y_i)=   X^T(f(Xw)-y) = X^T(sigmoid(Xw)-y)$\n",
    "\n",
    "损失函数的求梯度，你可以就按正常的函数求导来推导。\n",
    "\n",
    "其中$x_i$表示第i个样本对象（行向量），$y_i$表示第i个输出结果（行向量），$f$为激活函数，X为m个样本数据集，每行为一个对象，y为m个输出结果。\n",
    "\n",
    ">**再强调一遍，在线性回归中我们一般使用的案例都是预测一个数值型数据，在逻辑回归中代表分类的数字并不是数值型的，所以经过one-hot编码后变成了多个数值型的输出结果。**\n",
    "\n",
    "所以在逻辑回归中，梯度下降法的更新公式为\n",
    "\n",
    "$w_{k+1}=w_k-\\rho  \\nabla J(w_k)=w_k-\\rho  X^T(sigmoid(Xw_k)-y)$"
   ],
   "metadata": {
    "collapsed": false
   }
  },
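  {
   "cell_type": "markdown",
   "source": [
    "If you want to convince yourself of this gradient, a finite-difference check is a handy sketch. All the data below is random; the analytic gradient carries the 1/m factor of the averaged loss (the update rule above simply absorbs constant factors into the step size):"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def sigmoid(x):\n",
    "    return 1.0 / (1.0 + np.exp(-x))\n",
    "\n",
    "def loss(w, X, y):\n",
    "    p = sigmoid(X @ w)\n",
    "    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))\n",
    "\n",
    "np.random.seed(0)\n",
    "X = np.random.randn(20, 3)                   # random toy features\n",
    "y = (np.random.rand(20) > 0.5).astype(float) # random 0/1 labels\n",
    "w = np.random.randn(3)\n",
    "\n",
    "analytic = X.T @ (sigmoid(X @ w) - y) / len(y)  # (1/m) X^T (sigmoid(Xw) - y)\n",
    "\n",
    "numeric = np.zeros_like(w)\n",
    "eps = 1e-6\n",
    "for i in range(len(w)):\n",
    "    dw = np.zeros_like(w)\n",
    "    dw[i] = eps\n",
    "    numeric[i] = (loss(w + dw, X, y) - loss(w - dw, X, y)) / (2 * eps)\n",
    "\n",
    "print(np.allclose(analytic, numeric))  # True: the formula matches"
   ],
   "metadata": {
    "collapsed": false
   }
  },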
  {
   "cell_type": "markdown",
   "source": [
    "----------\n",
    "\n",
    "\n",
    "在有些教程中，喜欢用列向量表示一个样本对象x，则一个对象的输出结果y也是列向量。\n",
    "\n",
    "你看到的逻辑回归的梯度写法不相同，不过不用慌张，它只是数据的行列形式不同罢了。\n",
    "\n",
    "**我们来按照这个步骤解决softmax回归中的梯度下降。**\n",
    "\n",
    "喜欢公式的可以移步https://www.cnblogs.com/Determined22/p/6362951.html\n",
    "\n",
    "我们只对softmax回归的结果给出。\n",
    "\n",
    "在softmax回归中，梯度下降法的更新公式为\n",
    "\n",
    "$w_{k+1}=w_k-\\rho  \\nabla J(w_k)=w_k-\\rho  X^T(softmax(Xw_k)-y)$\n",
    "\n",
    "\n",
    "\n",
    "> 所以说线性回归、逻辑回归、softmax回归的梯度更新公式都是一样的。\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 随机梯度下降法\n",
    "\n",
    "\n",
    "前面的梯度下降法在每次更新回归系数(最优参数)时，都需要遍历整个数据集。如果有数十亿样本和成千上万的特征，那么该方法的计算复杂度就太高了。\n",
    "\n",
    "现在介绍一种算法，一次只用一个样本点去更新回归系数(最优参数)，这样就可以有效减少计算量了，这种方法就叫做随机梯度上升算法。\n",
    "\n",
    "在线性回归中回归系数的更新如下，其中$x_k$为第k个样本对象（行向量）。$y_k$为该对象输出的结果，为数值。\n",
    "\n",
    "$w_{k+1}=w_k-2*\\rho_kx_k^T(x_kw_k-y_k)$\n",
    "\n",
    "在逻辑回归中，每个对象的输出结果（类别）被one-hot编码成向量，随机梯度下降法的更新公式为\n",
    "\n",
    "$w_{k+1}=w_k-\\rho_kx_k^T(sigmoid(x_kw_k)-y_k)$\n",
    "\n",
    "其中$x_k$为一个样本对象（行向量），$y_k$为该对象的输出（行向量）。\n",
    "\n",
    "在softmax回归中，随机梯度下降法的更新公式为\n",
    "\n",
    "$w_{k+1}=w_k-\\rho_kx_k^T(softmax(x_kw_k)-y_k)$\n",
    "\n",
    "喜欢数学推导的朋友可以自己去百度，这里就不给出了。\n",
    "\n",
    "\n",
    "----------"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 二分类逻辑回归案例 ##"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 构造数据集 ##"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "每行为一个对象，每列为一个特征属性，第1列为x1，第2列为x2，最后一列为所属分类。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "data=[\n",
    "    [-0.017612,14.053064,0],\n",
    "\t[-0.752157,6.538620,0],\n",
    "\t[-1.322371,7.152853,0],\n",
    "\t[0.423363,11.054677,0],\n",
    "\t[0.569411,9.548755,0],\n",
    "\t[-0.026632,10.427743,0],\n",
    "\t[0.667394,12.741452,0],\n",
    "\t[1.347183,13.175500,0],\n",
    "\t[1.217916,9.597015,0],\n",
    "\t[-0.733928,9.098687,0],\n",
    "\t[1.416614,9.619232,0],\n",
    "\t[1.388610,9.341997,0],\n",
    "\t[0.317029,14.739025,0],\n",
    "\t[-0.576525,11.778922,0],\n",
    "\t[-1.781871,9.097953,0],\n",
    "\t[-1.395634,4.662541,1],\n",
    "\t[0.406704,7.067335,1],\n",
    "\t[-2.460150,6.866805,1],\n",
    "\t[0.850433,6.920334,1],\n",
    "\t[1.176813,3.167020,1],\n",
    "\t[-0.566606,5.749003,1],\n",
    "\t[0.931635,1.589505,1],\n",
    "\t[-0.024205,6.151823,1],\n",
    "\t[-0.036453,2.690988,1],\n",
    "\t[-0.196949,0.444165,1],\n",
    "\t[1.014459,5.754399,1],\n",
    "\t[1.985298,3.230619,1],\n",
    "\t[-1.693453,-0.557540,1],\n",
    "\t[-0.346811,-1.678730,1],\n",
    "\t[-2.124484,2.672471,1]\n",
    "]"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "先将数据集转化为逻辑分类可以处理的数据结构。即，为对象添加值为1的属性x0，将输出分类转换为one-hot编码。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "#加载数据集，最后一列最为类别标签，前面的为特征属性的值\n",
    "def loadDataSet(datasrc):\n",
    "    dataMat = np.mat(datasrc)\n",
    "    y = dataMat[:, dataMat.shape[1] - 1]  # 最后一列为结果列\n",
    "    b = np.ones(y.shape)  # 添加全1列向量代表b偏量\n",
    "    X = np.column_stack((b, dataMat[:, 0:dataMat.shape[1] - 1]))  # 特征属性集和b偏量组成x\n",
    "    X = np.mat(X)\n",
    "    labeltype = np.unique(y.tolist())  # 获取分类数目\n",
    "    eyes = np.eye(len(labeltype))  # 每一类用单位矩阵中对应的行代替，表示目标概率。如分类0的概率[1,0,0]，分类1的概率[0,1,0]，分类2的概率[0,0,1]\n",
    "    Y = np.zeros((X.shape[0], len(labeltype)))\n",
    "    for i in range(X.shape[0]):\n",
    "        Y[i, :] = eyes[int(y[i, 0])]  # 读取分类，替换成概率向量。这就要求分类为0,1,2,3,4,5这样的整数\n",
    "    # print(Y)\n",
    "    return X, y, Y  # X为特征数据集，y为分类数据集，Y为概率集"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "我们先来看看数据是个什么样子"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "#可视化样本数据集\n",
    "def plotDataSet():\n",
    "    dataMat,labelMat,labelPMat = loadDataSet(data)                        #加载数据集\n",
    "    plt.scatter(dataMat[:,1].flatten().A[0],dataMat[:,2].flatten().A[0],c=labelMat.flatten().A[0])                   #第一个偏量为b，第2个偏量x1，第3个偏量x2\n",
    "    plt.xlabel('X1'); plt.ylabel('X2')                                 #绘制label\n",
    "    plt.xlim([-3,3])\n",
    "    plt.ylim([-4,16])\n",
    "    plt.show()"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "![这里写图片描述](https://img-blog.csdnimg.cn/img_convert/69c8b1c9caaf439ae40836bab4ab6f73.png)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "我们使用的样本数据，有两个特征属性，x1和x2如果，算上b偏量就是三个回归系数。结果为2分类。\n",
    "\n",
    "则线性函数的输出结果为[z0,z1]\n",
    "\n",
    "z0=w00*1 + w01*x1 + w02*x2\n",
    "z1=w10*1 + w11*x1 + w12*x2\n",
    "\n",
    "线性函数的结果[z0,z1]再经过Sigmoid激活函数，可得到预测的分类概率[y0,y1]。\n",
    "\n",
    "当然最开始这个预测的分类概率并不准确，所以需要模型训练，不停的根据预测的结果是否准确来调整回归系数。\n",
    "\n",
    "其中Sigmoid逻辑回归函数实现如下：\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# sigmoid函数，逻辑回归函数，将线性回归值转化为概率的激活函数\n",
    "def sigmoid(x):\n",
    "    return 1.0 / (1 + np.exp(-x))"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 梯度下降法获取回归系数： ##"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "在逻辑回归中，梯度下降法的更新公式为\n",
    "\n",
    "$w_{k+1}=w_k-\\rho  \\nabla J(w_k)=w_k-\\rho  X^T(sigmoid(Xw_k)-y)$\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "按照这个公式实现代码"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# 逻辑回归中使用梯度下降法求回归系数。逻辑回归和线性回归中原理相同，只不过逻辑回归使用sigmoid作为迭代进化函数。因为逻辑回归是为了分类而生。线性回归为了预测而生\n",
    "def gradAscent(dataMat, labelPMat):\n",
    "    m, n = np.shape(dataMat)                                            #返回dataMatrix的大小。m为行数,n为列数。\n",
    "    alpha = 0.05                                                        #移动步长,也就是学习速率,控制更新的幅度。\n",
    "    maxCycles = 1000                                                      #最大迭代次数\n",
    "    weights = np.ones((n,labelPMat.shape[1]))                                             #初始化权重列向量\n",
    "    for k in range(maxCycles):\n",
    "        h =  sigmoid(dataMat * weights)                                #梯度上升矢量化公式，计算预测值（列向量）。每一个样本产生一个预测值\n",
    "        error = h-labelPMat                                            #计算每一个样本预测值误差\n",
    "        weights = weights - alpha * dataMat.T * error                   # 根据所有的样本产生的误差调整回归系数\n",
    "    return weights.getA()                                               # 将矩阵转换为数组，返回回归系数数组"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "绘制分类区域"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# 分类只能绘制分界区域。而不是通过分割线来可视化\n",
    "def plotBestFit(dataMat,labelMat,weights):\n",
    "\n",
    "    # 先产生x1和x2取值范围上的网格点，并预测每个网格点上的值。\n",
    "    step = 0.02\n",
    "    x1_min, x1_max = -3,3\n",
    "    x2_min, x2_max = -4,16\n",
    "    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, step), np.arange(x2_min, x2_max, step))\n",
    "    testMat = np.c_[xx1.ravel(), xx2.ravel()]   #形成测试特征数据集\n",
    "    testMat = np.column_stack((np.ones(((testMat.shape[0]),1)),testMat))  #添加第一列为全1代表b偏量\n",
    "    testMat = np.mat(testMat)\n",
    "    y = sigmoid(testMat*weights)   #输出每个样本属于每个分类的概率\n",
    "    predicted = y.argmax(axis=1)                            #获取每行最大值的位置，位置索引就是分类\n",
    "    predicted = predicted.reshape(xx1.shape).getA()\n",
    "    # 绘制区域网格图\n",
    "    plt.pcolormesh(xx1, xx2, predicted, cmap=plt.cm.Paired)\n",
    "\n",
    "    # 再绘制一遍样本数据点，这样方便查看\n",
    "    plt.scatter(dataMat[:, 1].flatten().A[0], dataMat[:, 2].flatten().A[0],\n",
    "                c=labelMat.flatten().A[0], alpha=.5)  # 第一个偏量为b，第2个偏量x1，第3个偏量x2\n",
    "    plt.xlim([-3,3])\n",
    "    plt.ylim([-4,16])\n",
    "    plt.show()\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    import numpy as np;\n",
    "    import matplotlib.pyplot as plt ;\n",
    "    dataMat, labelMat, labelPMat = loadDataSet(data)   # 加载数据集\n",
    "    weights = gradAscent(dataMat, labelPMat)         # 梯度下降法求回归系数\n",
    "    print(weights)\n",
    "\n",
    "    plotBestFit(dataMat,labelMat,weights)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "回归系数矩阵w为：\n",
    "\n",
    "[[-25.38603398  25.43723638]\n",
    " [ -3.98349475   3.99351765]\n",
    " [  3.60847635  -3.61579463]]\n",
    "\n",
    "绘制出的分割区域为\n",
    "\n",
    "![这里写图片描述](https://img-blog.csdnimg.cn/img_convert/489b27990fbc4b75d75a27d59255ac52.png)\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 随机梯度下降法获取回归系数： ##"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "在逻辑回归中，随机梯度下降法的更新公式为\n",
    "\n",
    "$w_{k+1}=w_k-\\rho_kx_k^T(sigmoid(x_kw_k)-y_k)$\n",
    "\n",
    "其中$x_k$为一个样本对象（行向量），$y_k$为该对象的输出（行向量）。\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# 逻辑回归中使用随机梯度下降算法。numIter为迭代次数。改进之处：alpha移动步长是变化的。一次只用一个样本点去更新回归系数，这样就可以有效减少计算量了\n",
    "def stocGradAscent(dataMatrix, labelPMat, numIter=500):\n",
    "    m,n = np.shape(dataMatrix)                                                #返回dataMatrix的大小。m为样本对象的数目,n为列数。\n",
    "    weights = np.ones((n,labelPMat.shape[1]))                                                  #参数初始化\n",
    "    for j in range(numIter):\n",
    "        for k in range(m):                                                    # 遍历m个样本对象\n",
    "            alpha = 10/(1.0+j+k)+0.01                                          #降低alpha的大小，每次减小1/(j+i)。刚开始的时候可以步长大一点，后面调整越精细\n",
    "            h = sigmoid(dataMatrix[k]*weights)                        #选择随机选取的一个样本，计算预测值h\n",
    "            error = h-labelPMat[k]                              #计算一个样本的误差\n",
    "            weights = weights - alpha * dataMatrix[k].T*error         #更新回归系数\n",
    "    return weights.getA()                                                           #将矩阵转换为数组，返回回归系数数组"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "获取回归系数矩阵，绘制分割区域"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "if __name__ == '__main__':\n",
    "    dataMat, labelMat, labelPMat = loadDataSet(data)   # 加载数据集\n",
    "    weights = stocGradAscent(dataMat, labelPMat)    # 局部梯度下降法求回归系数\n",
    "    print(weights)\n",
    "    plotBestFit(dataMat,labelMat,weights)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "回归系数矩阵为：\n",
    "\n",
    "[[-23.98429667  27.70302552]\n",
    " [ -3.8623706    4.56685091]\n",
    " [  3.36581925  -3.90760748]]\n",
    "\n",
    "\n",
    "绘制出的分割区域为\n",
    "\n",
    "![这里写图片描述](https://img-blog.csdnimg.cn/img_convert/7629975ef4706cca2b7fdf7d84101fbd.png)\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 对新对象进行预测"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "有了回归系数矩阵，就可以对逻辑回归问题进行分类了。\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# 对新对象进行预测\n",
    "def predict(weights,testdata):\n",
    "    testdata.insert(0, 1.0)       #现在首部添加1代表b偏量\n",
    "    testMat = np.mat([testdata])\n",
    "    y=sigmoid(testMat * np.mat(weights))  # 对输入进行预测\n",
    "    type = y.argmax(axis=1)  # 概率最大的分类就是预测分类。由于输出值y为行向量，所按行取最大值的位置\n",
    "    return type, y  # type为所属分类，h为属于每种分类的概率\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    dataMat, labelMat, labelPMat = loadDataSet(data)   # 加载数据集\n",
    "    weights = gradAscent(dataMat, labelPMat)         # 梯度下降法求回归系数\n",
    "    # weights = gradAscent(dataMat, labelPMat)         # 梯度下降法求回归系数\n",
    "    print(weights)\n",
    "    type,h = predict(weights,[0.317029,14.739025])\n",
    "    print(type,h)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "**输出分类和概率为**\n",
    "\n",
    "[[0]] [[  1.00000000e+00   4.43767950e-13]]\n",
    "\n",
    "表示输入对象[0.317029,14.739025]，属于分类0，因为属于分类0 的概率为1.0，属于分类1的概率为4.43767950e-13。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 多分类softmax回归 ##"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 加载数据集 ##"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "数据集就采用如下的数据，和二分类的区别在于类别有三种：0，1，2。\n",
    "\n",
    "样本数据集第一列为属性x1，第二列为属性x2，第三列为分类（三种类别）"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# 样本数据集，第一列为x1，第二列为x2，第三列为分类（三种类别）\n",
    "data=[\n",
    "        [-2.68420713, 0.32660731, 0],[-2.71539062, -0.16955685, 0],[-2.88981954, -0.13734561, 0],[-2.7464372, -0.31112432, 0],[-2.72859298, 0.33392456, 0],\n",
    "        [-2.27989736, 0.74778271, 0],[-2.82089068, -0.08210451, 0],[-2.62648199, 0.17040535, 0],[-2.88795857, -0.57079803, 0],[-2.67384469, -0.1066917, 0],\n",
    "        [-2.50652679,0.65193501,0],[-2.61314272,0.02152063,0],[-2.78743398,-0.22774019,0],[-3.22520045,-0.50327991,0],[-2.64354322,1.1861949,0],\n",
    "        [-2.38386932,1.34475434,0],[-2.6225262,0.81808967,0],[-2.64832273,0.31913667,0],[-2.19907796,0.87924409,0],[-2.58734619,0.52047364,0],\n",
    "        [1.28479459, 0.68543919, 1],[0.93241075, 0.31919809, 1],[1.46406132, 0.50418983, 1],[0.18096721, -0.82560394, 1],[1.08713449, 0.07539039, 1],\n",
    "        [0.64043675, -0.41732348, 1],[1.09522371, 0.28389121, 1],[-0.75146714, -1.00110751, 1],[1.04329778, 0.22895691, 1],[-0.01019007, -0.72057487, 1],\n",
    "        [-0.5110862,-1.26249195,1],[0.51109806,-0.10228411,1],[0.26233576,-0.5478933,1],[0.98404455,-0.12436042,1],[-0.174864,-0.25181557,1],\n",
    "        [0.92757294,0.46823621,1],[0.65959279,-0.35197629,1],[0.23454059,-0.33192183,1],[0.94236171,-0.54182226,1],[0.0432464,-0.58148945,1],\n",
    "        [2.53172698, -0.01184224, 2],[1.41407223, -0.57492506, 2],[2.61648461, 0.34193529, 2],[1.97081495, -0.18112569, 2],[2.34975798, -0.04188255, 2],\n",
    "        [3.39687992, 0.54716805, 2],[0.51938325, -1.19135169, 2],[2.9320051, 0.35237701, 2],[2.31967279, -0.24554817, 2],[2.91813423, 0.78038063, 2],\n",
    "        [1.66193495,0.2420384,2],[1.80234045,-0.21615461,2],[2.16537886,0.21528028,2],[1.34459422,-0.77641543,2],[1.5852673,-0.53930705,2],\n",
    "        [1.90474358,0.11881899,2],[1.94924878,0.04073026,2],[3.48876538,1.17154454,2],[3.79468686,0.25326557,2],[1.29832982,-0.76101394,2],\n",
    "]"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "先将数据集转化为逻辑分类可以处理的数据结构。即，为对象添加值为1的属性x0，将输出分类转换为one-hot编码。\n",
    "\n",
    "分类0表示为[1,0,0]，分类1表示为[0,1,0]，分类2表示为[0,0,1]"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# 加载数据集，最后一列最为类别标签，前面的为特征属性的值\n",
    "def loadDataSet(dataarr):\n",
    "    # 生成X和y矩阵\n",
    "    dataMat = np.mat(dataarr)\n",
    "    y = dataMat[:, dataMat.shape[1] - 1]  # 最后一列为结果列\n",
    "    b = np.ones(y.shape)  # 添加全1列向量代表b偏量\n",
    "    X = np.column_stack((b, dataMat[:, 0:dataMat.shape[1] - 1]))  # 特征属性集和b偏量组成x\n",
    "    X = np.mat(X)\n",
    "    labeltype = np.unique(y.tolist())       # 获取分类数目\n",
    "    eyes = np.eye(len(labeltype))    # 每一类用单位矩阵中对应的行代替，表示目标概率。如分类0的概率[1,0,0]，分类1的概率[0,1,0]，分类2的概率[0,0,1]\n",
    "    Y=np.zeros((X.shape[0],len(labeltype)))\n",
    "    for i in range(X.shape[0]):\n",
    "        Y[i,:] = eyes[int(y[i,0])]               # 读取分类，替换成概率向量。这就要求分类为0,1,2,3,4,5这样的整数\n",
    "    return X, y,Y       #X为特征数据集，y为分类数据集，Y为概率集"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "下面我们先来看看数据是什么样的。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "#可视化样本数据集\n",
    "def plotDataSet():\n",
    "    dataMat,labelMat,labelPMat = loadDataSet()                        #加载数据集\n",
    "    plt.scatter(dataMat[:,1].flatten().A[0],dataMat[:,2].flatten().A[0],c=labelMat.flatten().A[0])                   #第一个偏量为b，第2个偏量x1，第3个偏量x2\n",
    "    plt.xlabel('X1'); plt.ylabel('X2')                                 #绘制label\n",
    "    plt.show()"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "![这里写图片描述](https://img-blog.csdnimg.cn/img_convert/1d9ea5def2330138c4110d824c81b262.png)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## softmax回归的梯度下降法 ##"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "在softmax回归中，梯度下降法的更新公式为\n",
    "\n",
    "$w_{k+1}=w_k-\\rho  \\nabla J(w_k)=w_k-\\rho  X^T(softmax(Xw_k)-y)$\n",
    "\n",
    "按照这个公式实现代码"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# softmax函数，将线性回归值转化为概率的激活函数。输入s要是行向量\n",
    "def softmax(s):\n",
    "    return np.exp(s) / np.sum(np.exp(s), axis=1)\n",
    "\n",
    "# 逻辑回归中使用梯度下降法求回归系数。逻辑回归和线性回归中原理相同，只不过逻辑回归使用sigmoid作为迭代进化函数。因为逻辑回归是为了分类而生。线性回归为了预测而生\n",
    "def gradAscent(dataMat, labelPMat):\n",
    "    alpha = 0.2                                                        #移动步长,也就是学习速率,控制更新的幅度。\n",
    "    maxCycles = 1000                                                      #最大迭代次数\n",
    "    weights = np.ones((dataMat.shape[1],labelPMat.shape[1]))             #初始化权回归系数矩阵  系数矩阵的行数为特征矩阵的列数，系数矩阵的列数为分类数目\n",
    "    for k in range(maxCycles):\n",
    "        h =  softmax(dataMat*weights)                                #梯度上升矢量化公式，计算预测值（行向量）。每一个样本产生一个概率行向量\n",
    "        error = h-labelPMat                                            #计算每一个样本预测值误差\n",
    "        weights = weights - alpha * dataMat.T * error                   # 根据所有的样本产生的误差调整回归系数\n",
    "    return weights                                  "
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "绘制分类区域"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# 多分类只能绘制分界区域。而不是通过分割线来可视化\n",
    "def plotBestFit(dataMat,labelMat,weights):\n",
    "\n",
    "    # 获取数据边界值，也就属性的取值范围。\n",
    "    x1_min, x1_max = dataMat[:, 1].min() - .5, dataMat[:, 1].max() + .5\n",
    "    x2_min, x2_max = dataMat[:, 2].min() - .5, dataMat[:, 2].max() + .5\n",
    "    # 产生x1和x2取值范围上的网格点，并预测每个网格点上的值。\n",
    "    step = 0.02\n",
    "    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, step), np.arange(x2_min, x2_max, step))\n",
    "    testMat = np.c_[xx1.ravel(), xx2.ravel()]   #形成测试特征数据集\n",
    "    testMat = np.column_stack((np.ones(((testMat.shape[0]),1)),testMat))  #添加第一列为全1代表b偏量\n",
    "    testMat = np.mat(testMat)\n",
    "    # 预测网格点上的值\n",
    "    y = softmax(testMat*weights)   #输出每个样本属于每个分类的概率\n",
    "    # 判断所属的分类\n",
    "    predicted = y.argmax(axis=1)                            #获取每行最大值的位置，位置索引就是分类\n",
    "    predicted = predicted.reshape(xx1.shape).getA()\n",
    "    # 绘制区域网格图\n",
    "    plt.pcolormesh(xx1, xx2, predicted, cmap=plt.cm.Paired)\n",
    "\n",
    "    # 再绘制一遍样本点，方便对比查看\n",
    "    plt.scatter(dataMat[:, 1].flatten().A[0], dataMat[:, 2].flatten().A[0],\n",
    "                c=labelMat.flatten().A[0],alpha=.5)  # 第一个偏量为b，第2个偏量x1，第3个偏量x2\n",
    "    plt.show()\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "\n",
    "    dataMat, labelMat,labelPMat = loadDataSet(data)  # 加载数据集\n",
    "    # 梯度下降法\n",
    "    weights = gradAscent(dataMat, labelPMat)         # 梯度下降法求回归系数\n",
    "    print(weights)\n",
    "    plotBestFit(dataMat,labelMat,weights)\n"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "**权重矩阵w为**\n",
    "\n",
    "[[ -0.76478209  12.94845658  -9.18367448]\n",
    "\n",
    " [-10.22934766  -1.38818119  14.61752885]\n",
    " \n",
    " [  5.38500348   5.07899946  -7.46400295]]\n",
    "\n",
    "**绘制的分割区域为**\n",
    "\n",
    "![这里写图片描述](https://img-blog.csdnimg.cn/img_convert/b61c7de255e80866ccff4531e9c110c8.png)\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## softmax回归的随机梯度下降法 ##"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "在softmax回归中，随机梯度下降法的更新公式为\n",
    "\n",
    "$w_{k+1}=w_k-\\rho_kx_k^T(softmax(x_kw_k)-y_k)$"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "按照这个公式实现代码"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# 逻辑回归中使用随机梯度下降算法。numIter为迭代次数。改进之处：alpha移动步长是变化的。一次只用一个样本点去更新回归系数，这样就可以有效减少计算量了\n",
    "def stocGradAscent(dataMatrix, labelPMat, numIter=500):\n",
    "    m,n = np.shape(dataMatrix)                                                #返回dataMatrix的大小。m为样本对象的数目,n为列数。\n",
    "    weights = np.ones((n,labelPMat.shape[1]))                                                  #参数初始化\n",
    "    for j in range(numIter):\n",
    "        for k in range(m):                                                    # 遍历m个样本对象\n",
    "            alpha = 2/(1.0+j+k)+0.01                                          #降低alpha的大小，每次减小1/(j+i)。刚开始的时候可以步长大一点，后面调整越精细\n",
    "            h = softmax(dataMatrix[k]*weights)                        #选择随机选取的一个样本，计算预测值h\n",
    "            error = h-labelPMat[k]                              #计算一个样本的误差\n",
    "            weights = weights - alpha * dataMatrix[k].T*error         #更新回归系数\n",
    "    return weights.getA()                                                           #将矩阵转换为数组，返回回归系数数组"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "绘制分割区域"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "if __name__ == '__main__':\n",
    "\n",
    "    dataMat, labelMat,labelPMat = loadDataSet(data)  # 加载数据集\n",
    "    # 梯度下降法\n",
    "    weights = stocGradAscent(dataMat, labelPMat)         # 梯度下降法求回归系数\n",
    "    print(weights)\n",
    "    plotBestFit(dataMat,labelMat,weights)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "**得到的权重矩阵w为**\n",
    "\n",
    "[[ -0.86015701   4.56667081  -4.7265138 ]\n",
    "\n",
    " [ -0.4547426    3.74717339  10.49808187]\n",
    " \n",
    " [  3.26510843   1.9243891   -3.50245892]]\n",
    "\n",
    "**分割区域为**\n",
    "\n",
    "![这里写图片描述](https://img-blog.csdnimg.cn/img_convert/0d8aeed4f07c29dd116f28942963e95e.png)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 对新对象进行预测 ##\n",
    "\n",
    "有了回归系数矩阵，就可以对softmax回归问题进行多分类了。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# 对新对象进行预测\n",
    "def predict(weights,testdata):\n",
    "    testdata.insert(0, 1.0)    #现在首部添加1代表b偏量\n",
    "    testMat = np.mat([testdata])\n",
    "    y=softmax(testMat * np.mat(weights))\n",
    "    type = y.argmax(axis=1)   # 概率最大的分类就是预测分类。由于输出值y为行向量，所按行取最大值的位置\n",
    "    return type,y   #type为所属分类，h为属于每种分类的概率\n",
    "\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    dataMat, labelMat, labelPMat = loadDataSet(data)   # 加载数据集\n",
    "    weights = gradAscent(dataMat, labelPMat)         # 梯度下降法求回归系数\n",
    "    # weights = gradAscent(dataMat, labelPMat)         # 梯度下降法求回归系数\n",
    "    print(weights)\n",
    "    type,h = predict(weights,[1.90474358,0.11881899])\n",
    "    print(type,h)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "**输出分类和概率为**\n",
    "\n",
    "[[2]] [[  8.81398647e-08   5.11371774e-02   9.48862734e-01]]\n",
    "\n",
    "表示输入对象[1.90474358,0.11881899]，属于分类2，因为属于分类0 的概率为8.81398647e-08，属于分类1的概率为5.11371774e-02，属于分类2的概率为 9.48862734e-01。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 其他方式的编码"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "在逻辑回归中，二分类0和1一般要按one-hot编码成[1,0]和[0,1]。\n",
    "\n",
    "我们知道[1,0]表示的是样本对象100%的概率为分类0，0%的概率为分类1\n",
    "\n",
    "[0,1]表示的是样本对象0%的概率为分类0，100%的概率为分类1\n",
    "\n",
    "---\n",
    "\n",
    "那我们现在来尝试一下其他的概率表达方式：\n",
    "\n",
    "比如二分类中，分类0是我想要的，分类1不是我想要的。\n",
    "\n",
    "分类0的值表达成[100%]，表示100%是我想要的，\n",
    "\n",
    "分类1的值表达成[0%]，表示0%是我想要的。\n",
    "\n",
    "我们按照逻辑回归计算出来的概率值就是y只有一个值。这个值表达的是，是我想要的分类的概率，也就是该对象是分类0的概率。这个概率超过50%，我们就可以把他划分为分类0，低于50%，我们就可以划分为分类1。\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "由于我们使用这种方式进行编码，只计算一个输出值，\n",
    "所以要计算的回归系数矩阵，就变成了\n",
    "\n",
    "$$\n",
    "        \\begin{matrix}\n",
    "        w0   \\\\\n",
    "        w1  \\\\\n",
    "        w2 \\\\\n",
    "        \\end{matrix}\n",
    "$$\n",
    "\n",
    "而不再是\n",
    "\n",
    "$$\n",
    "        \\begin{matrix}\n",
    "        w00 & w01  \\\\\n",
    "        w10 & w11  \\\\\n",
    "        w20 & w21  \\\\\n",
    "        \\end{matrix}\n",
    "$$\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "实现代码为"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import random\n",
    "\n",
    "\n",
    "# 样本数据集，第一列为x1，第二列为x2，第三列为分类（二种类别）,必须用0和1表示分类\n",
    "data=[\n",
    "    [-0.017612, 14.053064, 0],[-0.752157, 6.538620, 0],[-1.322371, 7.152853, 0],[0.423363, 11.054677, 0],[0.569411, 9.548755, 0],\n",
    "    [-0.026632, 10.427743, 0],[0.667394, 12.741452, 0],[1.347183, 13.175500, 0],[1.217916, 9.597015, 0],[-0.733928, 9.098687, 0],\n",
    "    [1.416614, 9.619232, 0],[1.388610, 9.341997, 0],[0.317029, 14.739025, 0],[-0.576525, 11.778922, 0],[-1.781871, 9.097953, 0],\n",
    "    [-1.395634, 4.662541, 1],[0.406704, 7.067335, 1],[-2.460150, 2.866805, 1],[0.850433, 6.920334, 1],[1.176813, 3.167020, 1],\n",
    "    [-0.566606, 5.749003, 1],[0.931635, 1.589505, 1],[-0.024205, 6.151823, 1],[-0.036453, 2.690988, 1],[-0.196949, 0.444165, 1],\n",
    "    [1.014459, 5.754399, 1],[1.985298, 3.230619, 1],[-1.693453, -0.557540, 1],[-0.346811, -1.678730, 1],[-2.124484, 2.672471, 1]\n",
    "]\n",
    "\n",
    "\n",
    "#加载数据集，最后一列最为类别标签，前面的为特征属性的值\n",
    "def loadDataSet(datasrc):\n",
    "    dataMat = np.mat(datasrc)\n",
    "    y = dataMat[:, dataMat.shape[1] - 1]  # 最后一列为结果列\n",
    "    b = np.ones(y.shape)  # 添加全1列向量代表b偏量\n",
    "    X = np.column_stack((b, dataMat[:, 0:dataMat.shape[1] - 1]))  # 特征属性集和b偏量组成x\n",
    "    X = np.mat(X)\n",
    "    for i in range(y.shape[0]):\n",
    "        if y[i]==1:y[i]=0.0             # 分类1不是我想要的分类，所以概率写成0.0\n",
    "        elif y[i] == 0: y[i] = 1.0     # 分类0是我想要的分类，所以概率写成1.0\n",
    "    y=np.mat(y)\n",
    "    return X, y  # X为特征数据集，y为分类数据集，Y为概率集\n",
    "\n",
    "\n",
    "\n",
    "#可视化样本数据集\n",
    "def plotDataSet():\n",
    "    dataMat,labelMat,labelPMat = loadDataSet(data)                        #加载数据集\n",
    "    plt.scatter(dataMat[:,1].flatten().A[0],dataMat[:,2].flatten().A[0],c=labelMat.flatten().A[0])                   #第一个偏量为b，第2个偏量x1，第3个偏量x2\n",
    "    plt.xlabel('X1'); plt.ylabel('X2')                                 #绘制label\n",
    "    plt.xlim([-3,3])\n",
    "    plt.ylim([-4,16])\n",
    "    plt.show()\n",
    "\n",
    "# sigmoid函数，逻辑回归函数，将线性回归值转化为概率的激活函数\n",
    "def sigmoid(x):\n",
    "    return 1.0 / (1 + np.exp(-x))\n",
    "\n",
    "# 逻辑回归中使用梯度下降法求回归系数。逻辑回归和线性回归中原理相同，只不过逻辑回归使用sigmoid作为迭代进化函数。因为逻辑回归是为了分类而生。线性回归为了预测而生\n",
    "def gradAscent(dataMat, labelPMat):\n",
    "    m, n = np.shape(dataMat)                                            #返回dataMatrix的大小。m为行数,n为列数。\n",
    "    alpha = 0.05                                                        #移动步长,也就是学习速率,控制更新的幅度。\n",
    "    maxCycles = 1000                                                      #最大迭代次数\n",
    "    weights = np.ones((n,1))                                             #初始化权重列向量\n",
    "    for k in range(maxCycles):\n",
    "        h =  sigmoid(dataMat * weights)                                #梯度上升矢量化公式，计算预测值（列向量）。每一个样本产生一个预测值\n",
    "        error = h-labelPMat                                            #计算每一个样本预测值误差\n",
    "        weights = weights - alpha * dataMat.T * error                   # 根据所有的样本产生的误差调整回归系数\n",
    "    return weights.getA()                                               # 将矩阵转换为数组，返回回归系数数组\n",
    "\n",
    "# 逻辑回归中使用随机梯度下降算法。numIter为迭代次数。改进之处：alpha移动步长是变化的。一次只用一个样本点去更新回归系数，这样就可以有效减少计算量了\n",
    "def stocGradAscent(dataMatrix, labelMat, numIter=500):\n",
    "    m,n = np.shape(dataMatrix)                                                #返回dataMatrix的大小。m为样本对象的数目,n为列数。\n",
    "    weights = np.ones((n,1))                                                  #参数初始化\n",
    "    for j in range(numIter):\n",
    "        for k in range(m):                                                    # 遍历m个样本对象\n",
    "            alpha = 10/(1.0+j+k)+0.01                                          #降低alpha的大小，每次减小1/(j+i)。刚开始的时候可以步长大一点，后面调整越精细\n",
    "            h = sigmoid(dataMatrix[k]*weights)                        #选择随机选取的一个样本，计算预测值h\n",
    "            error = h-labelMat[k]                              #计算一个样本的误差\n",
    "            weights = weights - alpha * dataMatrix[k].T*error         #更新回归系数\n",
    "    return weights.getA()                                                           #将矩阵转换为数组，返回回归系数数组\n",
    "\n",
    "\n",
    "# 对新对象进行预测\n",
    "def predict1(weights,testdata):\n",
    "    testdata.insert(0, 1.0)       #现在首部添加1代表b偏量\n",
    "    testMat = np.mat([testdata])\n",
    "    z = testMat * np.mat(weights)\n",
    "    y=sigmoid(z)\n",
    "    if y>0.5:\n",
    "    # if z>0:    #h>0.5的判断等价于  z>0\n",
    "        return 1,y\n",
    "    else:\n",
    "        return 0,y\n",
    "\n",
    "\n",
    "\n",
    "# 绘制分界线。\n",
    "def plotBestFit(dataMat,labelMat,weights):\n",
    "    plt.scatter(dataMat[:, 1].flatten().A[0], dataMat[:, 2].flatten().A[0],\n",
    "                c=labelMat.flatten().A[0],alpha=.5)  # 第一个偏量为b，第2个偏量x1，第3个偏量x2\n",
    "\n",
    "    x1 = np.arange(-4.0, 4.0, 0.1)\n",
    "    x2 = (-weights[0] - weights[1] * x1) / weights[2]    # 逻辑回归获取的回归系数，满足w0+w1*x1+w2*x2=0，即x2 =(-w0-w1*x1)/w2\n",
    "    plt.plot(x1, x2)\n",
    "    plt.xlabel('X1'); plt.ylabel('X2')                                    #绘制label\n",
    "    plt.show()\n",
    "\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    # 查看数据集的分布\n",
    "    # plotDataSet()\n",
    "\n",
    "    dataMat, labelMat = loadDataSet(data)   # 加载数据集\n",
    "    weights = gradAscent(dataMat, labelMat)         # 梯度下降法求回归系数\n",
    "    # weights = stocGradAscent(dataMat, labelMat)    # 局部梯度下降法求回归系数\n",
    "    print(weights)\n",
    "    type,y = predict1(weights,[0.317029,14.739025])\n",
    "    print(type,y)\n",
    "    plotBestFit(dataMat,labelMat,weights)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "**运行函数得到的回归系数为**\n",
    "\n",
    "[[-23.98429667]\n",
    "\n",
    " [ -3.8623706 ]\n",
    " \n",
    " [  3.36581925]]\n",
    "\n",
    "这个回归系数矩阵表达式是y=sigmoid(xw)中的w。\n",
    "\n",
    "我们通过y>0.5进行分类判别。由于sigmoid是调单递增的。所以判别也就等价于xw>0。\n",
    "\n",
    "将这个公式展开。\n",
    "\n",
    "$1*w0+x1*w1+x2*w2>0$\n",
    "\n",
    "将训练迭代的系数带入公式，\n",
    "\n",
    "$1*-23.98429667+x1*-3.8623706+x2*3.36581925>0$\n",
    "\n",
    "**下图绘制出了这条边界线。**\n",
    "\n",
    "![这里写图片描述](https://img-blog.csdnimg.cn/img_convert/d1c6255c5fffd835fcdef874f68de957.png)\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "逻辑回归LR的特征为什么要先离散化\n",
    "-----------------\n",
    "在工业界，很少直接将连续值作为特征喂给逻辑回归模型，而是将连续特征离散化为一系列0、1特征交给逻辑回归模型，这样做的优势有以下几点：\n",
    "\n",
    "0、离散特征的增加和减少都很容易，易于模型的快速迭代；\n",
    "\n",
    "1、稀疏向量内积乘法运算速度快，计算结果方便存储，容易扩展；\n",
    "\n",
    "2、离散化后的特征对异常数据有很强的鲁棒性：比如一个特征是年龄>30是1，否则0。如果特征没有离散化，一个异常数据“年龄300岁”会给模型造成很大的干扰；\n",
    "\n",
    "3、逻辑回归属于广义线性模型，表达能力受限；单变量离散化为N个后，每个变量有单独的权重，相当于为模型引入了非线性，能够提升模型表达能力，加大拟合；\n",
    "\n",
    "4、离散化后可以进行特征交叉，由M+N个变量变为M*N个变量，进一步引入非线性，提升表达能力；\n",
    "\n",
    "5、特征离散化后，模型会更稳定，比如如果对用户年龄离散化，20-30作为一个区间，不会因为一个用户年龄长了一岁就变成一个完全不同的人。当然处于区间相邻处的样本会刚好相反，所以怎么划分区间是门学问；\n",
    "\n",
    "6、特征离散化以后，起到了简化了逻辑回归模型的作用，降低了模型过拟合的风险。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
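  {
   "cell_type": "markdown",
   "source": [
    "As a concrete sketch of the age example (the bucket edges below are made up), binning plus one-hot encoding looks like this:"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "ages = np.array([18, 25, 34, 42, 300])  # 300 is the anomalous record\n",
    "bins = np.array([20, 30, 40])           # made-up bucket edges: <20, 20-30, 30-40, >=40\n",
    "\n",
    "idx = np.digitize(ages, bins)           # bucket index for each sample\n",
    "one_hot = np.eye(len(bins) + 1)[idx]    # one 0/1 feature per bucket\n",
    "print(one_hot)\n",
    "# The outlier 300 simply lands in the last bucket instead of\n",
    "# dominating a continuous 'age' feature"
   ],
   "metadata": {
    "collapsed": false
   }
  },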
  {
   "cell_type": "markdown",
   "source": [
    "```\n",
    "李沐少帅指出，模型是使用离散特征还是连续特征，其实是一个“海量离散特征+简单模型” 同 “少量连续特征+复杂模型”的权衡。\n",
    "\n",
    "既可以离散化用线性模型，也可以用连续特征加深度学习。就看是喜欢折腾特征还是折腾模型了。\n",
    "\n",
    "通常来说，前者容易，而且可以n个人一起并行做，有成功经验；后者目前看很赞，能走多远还须拭目以待。\n",
    "```"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "大概的理解：\n",
    "\n",
    "1）计算简单\n",
    "\n",
    "2）简化模型\n",
    "\n",
    "3）增强模型的泛化能力，不易受噪声的影响"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 并行LR的实现"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "由逻辑回归问题的求解方法中可以看出，无论是梯度下降法、牛顿法、拟牛顿法，计算梯度都是其最基本的步骤，并且L-BFGS通过两步循环计算牛顿方向的方法，避免了计算海森矩阵。\n",
    "\n",
    "因此逻辑回归的并行化最主要的就是对目标函数梯度计算的并行化。\n",
    "\n",
    "从公式(2)中可以看出，目标函数的梯度向量计算中只需要进行向量间的点乘和相加，可以很容易将每个迭代过程拆分成相互独立的计算步骤，由不同的节点进行独立计算，然后归并计算结果。\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "将M个样本的标签构成一个M维的标签向量，M个N维特征向量构成一个M*N的样本矩阵"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "如果将样本矩阵按行划分，将样本特征向量分布到不同的计算节点，由各计算节点完成自己所负责样本的点乘与求和计算，然后将计算结果进行归并，则实现了“按行并行的LR”。\n",
    "\n",
    "按行并行的LR解决了样本数量的问题，但是实际情况中会存在针对高维特征向量进行逻辑回归的场景（如广告系统中的特征维度高达上亿），仅仅按行进行并行处理，无法满足这类场景的需求，因此还需要按列将高维的特征向量拆分成若干小的向量进行求解。"
   ],
   "metadata": {
    "collapsed": false
   }
  }
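  ,
  {
   "cell_type": "markdown",
   "source": [
    "A minimal sketch of row-parallel gradient computation, simulating the compute nodes with array chunks (in a real system each chunk would live on a different machine; the merge step is just a sum):"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def sigmoid(x):\n",
    "    return 1.0 / (1.0 + np.exp(-x))\n",
    "\n",
    "np.random.seed(1)\n",
    "X = np.random.randn(1000, 5)                    # M x N sample matrix (random toy data)\n",
    "y = (np.random.rand(1000) > 0.5).astype(float)  # random 0/1 labels\n",
    "w = np.zeros(5)\n",
    "\n",
    "# Partition by rows: each simulated node computes a partial gradient independently\n",
    "parts = [Xi.T @ (sigmoid(Xi @ w) - yi)\n",
    "         for Xi, yi in zip(np.array_split(X, 4), np.array_split(y, 4))]\n",
    "grad_parallel = np.sum(parts, axis=0)           # merge step: sum the partial results\n",
    "\n",
    "grad_serial = X.T @ (sigmoid(X @ w) - y)\n",
    "print(np.allclose(grad_parallel, grad_serial))  # True: same gradient either way"
   ],
   "metadata": {
    "collapsed": false
   }
  }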
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "source": [],
    "metadata": {
     "collapsed": false
    }
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}